Towards Learning Generalizable Code Embeddings Using Task-agnostic Graph Convolutional Networks

Abstract

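Code embeddings have seen increasing applications in software engineering (SE) research and practice recently. Despite the advances in embedding techniques applied in SE research, one of the main challenges is their generalizability. A recent study finds that code embeddings may not be readily leveraged for downstream tasks that the embeddings are not particularly trained for. Therefore, in this article, we propose GraphCodeVec, which represents source code as graphs and leverages Graph Convolutional Networks to learn more generalizable code embeddings in a task-agnostic manner. The edges in the graph representation are automatically constructed from the paths in the abstract syntax trees, and the nodes from the tokens in the source code. To evaluate the effectiveness of GraphCodeVec, we consider three downstream benchmark tasks (i.e., code comment generation, code authorship identification, and code clones detection) that are used in a prior benchmarking of code embeddings and add three new downstream tasks (i.e., source code classification, logging statements prediction, and software defect prediction), resulting in a total of six downstream tasks that are considered in our evaluation. For each downstream task, we apply the embeddings learned by GraphCodeVec and the embeddings learned by the four baseline approaches and compare their respective performance. We find that GraphCodeVec outperforms all the baselines in five out of the six downstream tasks, and its performance is relatively stable across different tasks and datasets. In addition, we perform ablation experiments to understand the impacts of the training context (i.e., the graph context extracted from the abstract syntax trees) and the training model (i.e., the Graph Convolutional Networks) on the effectiveness of the generated embeddings. The results show that both the graph context and the Graph Convolutional Networks can benefit GraphCodeVec in producing high-quality embeddings for the downstream tasks, while the improvement by the Graph Convolutional Networks is more robust across different downstream tasks and datasets. Our findings suggest that future research and practice may consider using graph-based deep learning methods to capture the structural information of the source code for SE tasks.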
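The graph construction and convolution described in the abstract can be illustrated with a small sketch. This is not the GraphCodeVec implementation: it assumes parent-child links in Python's `ast` output as a stand-in for the paper's abstract-syntax-tree paths, uses untrained random weights, and the names `label`, `build_token_graph`, and `gcn_layer` are hypothetical. It only shows the general shape of the idea, namely tokens as nodes, tree-derived edges, and one normalized neighborhood aggregation step.

```python
import ast
import math
import random


def label(node):
    """Token label for an AST node (assumption: identifiers and literals act as tokens)."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return type(node).__name__


def build_token_graph(source):
    """Nodes are distinct labels; undirected edges link parent and child labels."""
    edges = set()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            a, b = label(parent), label(child)
            if a != b:
                edges.add(tuple(sorted((a, b))))
    vocab = sorted({token for edge in edges for token in edge})
    return vocab, edges


def gcn_layer(vocab, edges, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(vocab)
    pos = {token: i for i, token in enumerate(vocab)}
    # Adjacency matrix with self-loops on the diagonal.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        adj[pos[a]][pos[b]] = adj[pos[b]][pos[a]] = 1.0
    deg = [sum(row) for row in adj]
    d_in, d_out = len(weight), len(weight[0])
    out = []
    for i in range(n):
        # Symmetrically normalized sum over neighbors (and the node itself).
        agg = [
            sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * features[j][k] for j in range(n))
            for k in range(d_in)
        ]
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(d_in))) for m in range(d_out)])
    return out


SNIPPET = "def add(x, y):\n    return x + y\n"
vocab, edges = build_token_graph(SNIPPET)

rng = random.Random(0)  # fixed seed; the weights are random, not learned
features = [[rng.uniform(-1, 1) for _ in range(4)] for _ in vocab]
weight = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
embeddings = gcn_layer(vocab, edges, features, weight)
```

In a real setting the weight matrix would be trained and several layers stacked; here the single untrained layer only demonstrates how each token's vector mixes in the vectors of its structural neighbors.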

Similar resources

Towards Generalizable Sentence Embeddings

In this work, we evaluate different sentence encoders with emphasis on examining their embedding spaces. Specifically, we hypothesize that a “high-quality” embedding aids in generalization, promoting transfer learning as well as zero-shot and one-shot learning. To investigate this, we modify Skipthought vectors to learn a more generalizable space by exploiting a small amount of supervision. The...

Convolutional 2D Knowledge Graph Embeddings

Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models – which potentially limits performance. In this work we introduce ConvE, a multi-layer convolutiona...

Towards Language-Agnostic Mobile Code

The Java Virtual Machine is primarily designed for transporting Java programs. As a consequence, when JVM bytecodes are used to transport programs in other languages, the result becomes less acceptable the more the source language diverges from Java. Microsoft’s .NET transport format fares better in this respect because it has a more flexible type system and instruction set, but it is not exten...

Towards Incremental Learning with Deep Convolutional Networks

Deep neural networks are a powerful class of machine learning models. However they require a lot of time and computational resources to train. We propose to apply an incremental learning approach to train models by utilizing the information present in pre-trained models. We build towards this goal by studying the relationship between network architecture, categories in training data, the amount...

Protein Interface Prediction using Graph Convolutional Networks

We consider the prediction of interfaces between proteins, a challenging problem with important applications in drug discovery and design, and examine the performance of existing and newly proposed spatial graph convolution operators for this task. By performing convolution over a local neighborhood of a node of interest, we are able to stack multiple layers of convolution and learn effective l...

Journal

Journal title: ACM Transactions on Software Engineering and Methodology

Year: 2023

ISSN: 1049-331X, 1557-7392

DOI: https://doi.org/10.1145/3542944